52 research outputs found
Collagen Fiber Regulation in Human Pediatric Aortic Valve Development and Disease
Congenital aortic valve stenosis (CAVS) affects up to 10% of the world population, yet no medical therapies are available to treat the disease. New molecular targets are continually being sought that can halt CAVS progression. Collagen deregulation is a hallmark of CAVS yet remains mostly undefined. Here, histological studies were paired with high-resolution accurate-mass (HRAM) collagen-targeting proteomics to investigate collagen fiber production and regulation associated with human AV development and pediatric end-stage CAVS (pCAVS). Histological studies identified collagen fiber realignment and unique regions of high-density collagen in pCAVS. Proteomic analysis showed that specific collagen peptides are modified by hydroxylated prolines (HYP), a post-translational modification critical to stabilizing the collagen triple helix. Quantitative data analysis revealed significant regulation of collagen HYP sites across patient categories. Of the 44 proteins identified, 26 were non-collagen ECM proteins with direct roles in collagen synthesis, regulation, or modification. Network analysis identified BAMBI (BMP and Activin Membrane Bound Inhibitor) as a potential upstream regulator of the collagen interactome. This is the first study to detail the collagen types and HYP modifications associated with human AV development and pCAVS. We anticipate that this study will inform new therapeutic avenues that inhibit valvular degradation in pCAVS, as well as engineered options for valve replacement.
Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts
To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence-based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed, but the number of systems meeting both requirements is very limited. We introduce the use of SENNA ("Semantic Extraction using a Neural Network Architecture"), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized, computationally demanding NLP approaches.
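Concretely, the predicate-argument frames produced by an SRL tagger can be mapped to relation triples by collecting the agent (ARG0), predicate (V), and patient (ARG1) spans of each frame. The sketch below assumes a hypothetical PropBank-style BIO tag format for illustration; it is not SENNA's actual interface.

```python
# Sketch: turning PropBank-style SRL tags into a relation triple.
# The token/tag format here is a hypothetical example, not SENNA's output.

def extract_relation(tokens, tags):
    """Collect ARG0 (agent), V (predicate), and ARG1 (patient) spans
    from one predicate frame and return them as a triple."""
    spans = {"ARG0": [], "V": [], "ARG1": []}
    for token, tag in zip(tokens, tags):
        # Strip BIO prefixes like "B-ARG0" / "I-ARG0".
        role = tag.split("-", 1)[-1]
        if role in spans:
            spans[role].append(token)
    if all(spans.values()):
        return (" ".join(spans["ARG0"]),
                " ".join(spans["V"]),
                " ".join(spans["ARG1"]))
    return None  # incomplete frame: no relation extracted

tokens = ["IL-2", "activates", "T", "cells"]
tags = ["B-ARG0", "B-V", "B-ARG1", "I-ARG1"]
print(extract_relation(tokens, tags))  # ('IL-2', 'activates', 'T cells')
```

Because the mapping only reads role labels, it supports any relation type the tagger can frame, which is what makes SRL-based extraction more general than systems built for a fixed relation inventory.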
TAPCHA: An Invisible CAPTCHA Scheme
TAPCHA is a universal CAPTCHA scheme designed for touch-enabled smart devices such as smartphones, tablets, and smartwatches. The main difference between TAPCHA and other CAPTCHA schemes is that TAPCHA retains its security by making the CAPTCHA test "invisible" to the bot. It then utilises context effects to maintain the readability of the instruction for human users, which in turn ensures the usability of the scheme. Two reference designs, namely TAPCHA SHAPE & SHADE and TAPCHA MULTI, are developed to demonstrate the use of this scheme.
UA-KO at SemEval-2022 Task 11: Data Augmentation and Ensembles for Korean Named Entity Recognition
This paper presents the approaches and systems of the UA-KO team for the Korean portion of SemEval-2022 Task 11 on Multilingual Complex Named Entity Recognition. We fine-tuned Korean and multilingual BERT and RoBERTa models, and conducted experiments on data augmentation, ensembles, and task-adaptive pretraining. Our final system ranked 8th out of 17 teams with an F1 score of 0.6749. © 2022 Association for Computational Linguistics. Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected].
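A simple form of the ensembling mentioned above is per-token majority voting over the BIO tag sequences predicted by several NER models. The sketch below is a minimal illustration under that assumption, not the team's actual system.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-token BIO tag sequences from several NER models by
    taking the most common tag at each position; ties fall back to the
    tag of the earliest-listed model (Counter preserves insertion order)."""
    ensembled = []
    for token_tags in zip(*predictions):
        tag, _ = Counter(token_tags).most_common(1)[0]
        ensembled.append(tag)
    return ensembled

# Toy predictions from three hypothetical models over four tokens.
model_a = ["B-PER", "I-PER", "O", "B-LOC"]
model_b = ["B-PER", "O",     "O", "B-LOC"]
model_c = ["B-PER", "I-PER", "O", "O"]
print(majority_vote([model_a, model_b, model_c]))
# ['B-PER', 'I-PER', 'O', 'B-LOC']
```

A production ensemble would typically also repair invalid BIO transitions (e.g. an `I-` tag with no preceding `B-`) after voting.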
Detection of Puffery on the English Wikipedia
On Wikipedia, an online crowdsourced encyclopedia, volunteers enforce the encyclopedia's editorial policies. Wikipedia's policy on maintaining a neutral point of view has inspired recent research on bias detection, including "weasel words" and "hedges". Yet to date, little work has been done on identifying "puffery," phrases that are overly positive without a verifiable source. We demonstrate that collecting training data for this task requires some care, and construct a dataset by combining Wikipedia editorial annotations and information retrieval techniques. We compare several approaches to predicting puffery, and achieve an F1 score of 0.963 by incorporating citation features into a RoBERTa model. Finally, we demonstrate how to integrate our model with Wikipedia's public infrastructure to give back to the Wikipedia editor community. © 2021 Association for Computational Linguistics.
Triplet-Trained Vector Space and Sieve-Based Search Improve Biomedical Concept Normalization
Concept normalization, the task of linking textual mentions of concepts to concepts in an ontology, is critical for mining and analyzing biomedical texts. We propose a vector-space model for concept normalization, where mentions and concepts are encoded via transformer networks that are trained via a triplet objective with online hard triplet mining. The transformer networks refine existing pre-trained models, and the online triplet mining makes training efficient even with hundreds of thousands of concepts by sampling training triples within each mini-batch. We introduce a variety of strategies for searching with the trained vector-space model, including approaches that incorporate domain-specific synonyms at search time with no model retraining. Across five datasets, our models that are trained only once on their corresponding ontologies are within 3 points of state-of-the-art models that are retrained for each new domain. Our models can also be trained for each domain, achieving new state-of-the-art on multiple datasets. © 2021 Association for Computational Linguistics.
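The triplet objective with online hard mining described above can be sketched as follows: within each mini-batch, every anchor is paired with its farthest same-concept positive and its closest different-concept negative before applying the hinge triplet loss. The distance function, margin, and toy batch below are illustrative assumptions, not the paper's configuration.

```python
import math

def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hard_triplet_loss(embeddings, labels, margin=1.0):
    """Online hard triplet mining within one mini-batch: for each anchor,
    take its farthest same-label positive and its closest different-label
    negative, then average max(0, d(a, p) - d(a, n) + margin) over anchors."""
    losses = []
    for i, (e, y) in enumerate(zip(embeddings, labels)):
        pos = [dist(e, embeddings[j]) for j in range(len(labels))
               if j != i and labels[j] == y]
        neg = [dist(e, embeddings[j]) for j in range(len(labels))
               if labels[j] != y]
        if pos and neg:  # anchor needs at least one positive and negative
            losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0

# Toy batch: two mentions of concept 0, two of concept 1, well separated.
batch = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
labels = [0, 0, 1, 1]
print(hard_triplet_loss(batch, labels, margin=0.5))  # 0.0: margin satisfied
```

Mining the hardest pairs inside the batch is what keeps training tractable with hundreds of thousands of concepts: no global search over all triples is ever needed.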
Topic model analysis of metaphor frequency for psycholinguistic stimuli
Psycholinguistic studies of metaphor processing must control their stimuli not just for word frequency but also for the frequency with which a term is used metaphorically. Thus, we consider the task of metaphor frequency estimation, which predicts how often target words will be used metaphorically. We develop metaphor classifiers which represent metaphorical domains through Latent Dirichlet Allocation, and apply these classifiers to the target words, aggregating their decisions to estimate the metaphorical frequencies. Training on only 400 sentences, our models are able to achieve 61.3% accuracy on metaphor classification and 77.8% accuracy on HIGH vs. LOW metaphorical frequency estimation.
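The decision-aggregation step can be sketched as below: classify each sentence containing the target word, take the proportion flagged metaphorical, and bucket it as HIGH or LOW. The stand-in classifier and threshold are purely hypothetical (the paper's classifiers use LDA topic features, not keyword matching).

```python
def metaphor_frequency(sentences, classify, high_threshold=0.5):
    """Estimate how often a target word is used metaphorically by
    classifying each sentence containing it, aggregating the binary
    decisions into a proportion, and bucketing it HIGH or LOW."""
    decisions = [classify(s) for s in sentences]
    freq = sum(decisions) / len(decisions)
    return freq, ("HIGH" if freq >= high_threshold else "LOW")

# Hypothetical stand-in classifier: flags sentences whose other words
# suggest an abstract (non-literal) domain for the target verb.
def toy_classifier(sentence):
    return int(any(w in sentence for w in ("argument", "idea", "time")))

uses = ["He attacked my argument.",
        "The army attacked the fort.",
        "She attacked the problem with a fresh idea."]
freq, label = metaphor_frequency(uses, toy_classifier)
print(freq, label)  # 2 of 3 uses flagged metaphorical -> HIGH
```

Controlling stimuli by this estimated frequency, rather than raw word frequency alone, is what the classifier pipeline is ultimately for.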
Domain adaptation in practice: Lessons from a real-world information extraction pipeline
Advances in transfer learning and domain adaptation have raised hopes that once-challenging NLP tasks are ready to be put to use for sophisticated information extraction needs. In this work, we describe an effort to do just that: combining state-of-the-art neural methods for negation detection, document time relation extraction, and aspectual link prediction, with the eventual goal of extracting drug timelines from electronic health record text. We train on the THYME colon cancer corpus and test on both the THYME brain cancer corpus and an internal corpus, and show that performance of the combined systems is unacceptable despite good performance of individual systems. Although domain adaptation shows improvements on each individual system, the model selection problem is a barrier to improving overall pipeline performance.
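The finding that individually good systems combine into an unacceptable pipeline is consistent with stage errors compounding roughly multiplicatively: if a downstream stage only succeeds when every upstream stage did, end-to-end accuracy is bounded by the product of stage accuracies. The numbers below are illustrative, not the paper's reported scores.

```python
from functools import reduce

def pipeline_upper_bound(stage_accuracies):
    """Optimistic end-to-end accuracy estimate for a pipeline whose
    stages fail independently and whose later stages need all earlier
    stages to succeed: the product of per-stage accuracies."""
    return reduce(lambda a, b: a * b, stage_accuracies, 1.0)

# Illustrative per-stage accuracies for negation detection, document
# time relation extraction, and aspectual link prediction.
stages = [0.90, 0.85, 0.80]
print(round(pipeline_upper_bound(stages), 3))  # 0.612
```

Three stages that each look respectable in isolation can thus leave barely 60% of pipeline outputs fully correct, which is why per-component evaluation alone can be misleading.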
- …